Model-based clustering of mixed data with sparse dependence

Abstract

Mixed data refers to a mixture of continuous and categorical variables. Clustering mixed data is a long-standing statistical problem. The latent Gaussian model, a model-based approach for such problems, has received attention owing to its simplicity and interpretability. However, these approaches are prone to dimensionality problems. Specifically, parameters must be estimated for each group, and the number of covariance parameters is quadratic in the number of variables. To address this, we propose “regClustMD,” a novel method that can handle sparse dependence among variables. We consider sparsity by assuming that the precision matrix between the variables has few nonzero elements. We propose maximizing the penalized complete log-likelihood using a Monte Carlo expectation-maximization (MCEM) algorithm. We demonstrate our method through a simulation study and real-world examples.
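
To make the dimensionality concern concrete, the short sketch below counts the free covariance parameters of a Gaussian mixture with one unrestricted covariance matrix per group, and compares that with a sparse precision matrix per group. It is only an illustration of the quadratic growth mentioned in the abstract, not the regClustMD implementation; the function names and the example sizes (3 groups, 4 nonzero off-diagonal pairs) are hypothetical and chosen for this example.

```python
def covariance_parameter_count(num_variables: int, num_groups: int) -> int:
    """Free parameters in one unrestricted symmetric covariance matrix per group."""
    # A symmetric p x p matrix has p diagonal entries plus p(p-1)/2 off-diagonal ones.
    per_group = num_variables * (num_variables + 1) // 2
    return num_groups * per_group


def sparse_precision_parameter_count(
    num_variables: int, num_groups: int, nonzero_off_diagonal_pairs: int
) -> int:
    """Free parameters when each group's precision matrix keeps only a few
    nonzero off-diagonal pairs (hypothetical sparsity level, for illustration)."""
    per_group = num_variables + nonzero_off_diagonal_pairs
    return num_groups * per_group


if __name__ == "__main__":
    # Assumed example sizes: 3 groups, 4 nonzero off-diagonal pairs per group.
    for p in (5, 10, 50):
        dense = covariance_parameter_count(p, 3)
        sparse = sparse_precision_parameter_count(p, 3, 4)
        print(f"p={p}: dense={dense}, sparse={sparse}")
```

With 50 variables and 3 groups, the unrestricted count is 3825, whereas the sparse count under these assumed settings is 162, which illustrates why a sparsity assumption on the precision matrix reduces the estimation burden.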

Related resources

Model based clustering for mixed data: clustMD

A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the lat...

Model-based Co-clustering for High Dimensional Sparse Data

We propose a novel model based on the von Mises-Fisher (vMF) distribution for coclustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the m...

Model-based clustering of Gaussian copulas for mixed data

Clustering mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate...

Clustering of Conceptual Graphs with Sparse Data

This paper gives a theoretical framework for clustering a set of conceptual graphs characterized by sparse descriptions. The formed clusters are named in an intelligible manner through the concept of stereotype, based on the notion of default generalization. The cognitive model we propose relies on sets of stereotypes and makes it possible to save data in a structured memory.

Mixture model clustering for mixed data with missing information

One difficulty with classification studies is unobserved or missing observations that often occur in multivariate datasets. The mixture likelihood approach to clustering has been well developed and is much used, particularly for mixtures where the component distributions are multivariate normal. It is shown that this approach can be extended to analyse data with mixed categorical and continuous at...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3296790